A systematic review of social network sentiment analysis with comparative study of ensemble-based techniques

Sentiment Analysis (SA) of text reviews is an emerging concern in Natural Language Processing (NLP). It is a widely used method for analyzing and extracting opinions from text using individual or ensemble learning techniques, and it has considerable potential in the digital world and on social media platforms. We therefore present a systematic survey that organizes and describes the current state of SA and provides a structured overview of proposed approaches, from traditional to advanced. This work also discusses SA-related challenges, feature engineering techniques, benchmark datasets, popular publication platforms, and the best-performing algorithms for advancing automatic SA. Furthermore, a comparative study has been conducted to assess the performance of bagging- and boosting-based ensemble techniques for social network SA. Bagging and boosting are the two major approaches to ensemble learning, and each comprises various ensemble algorithms for classifying sentiment polarity. Recent studies suggest that ensemble learning techniques are well suited to sentiment classification. This analytical study examines bagging- and boosting-based ensemble techniques on four benchmark datasets to provide extensive knowledge regarding ensemble techniques for SA. The efficiency and accuracy of these techniques have been measured in terms of TPR, FPR, Weighted F-Score, Weighted Precision, Weighted Recall, Accuracy, ROC-AUC curve, and Run-Time. The comparative results reveal that bagging-based ensemble techniques outperformed boosting-based techniques for text classification. This extensive review aims to present benchmark information regarding social network SA that will be helpful for future research in this field.


Introduction
With the steady growth of information technology and social platforms, user-generated information can easily be posted online, and this information contains people's sentiments and emotions toward a particular issue. Governments, companies, and individuals are interested in retrieving the sentiments behind these reviews. Unfortunately, with the massive amount of data, it is challenging to determine the polarity of these comments and reviews, and human experts are too expensive for labeling them manually. Accordingly, SA is gaining a lot of popularity as a research topic (Chen and Yang 2011). It is a widely used method for analyzing and extracting opinions from text using individual or ensemble learning techniques, and it has considerable potential in the digital world and on social media platforms. The vast content generated on the web is unstructured; SA can process it and convert it into meaningful information. SA is a subfield of NLP that combines computational linguistics, rule-based approaches, and machine learning to extract the public's opinion from content provided on social platforms, including text, images, and videos. Depending on the requirements of a particular application, the problem of sentiment classification is primarily handled at the aspect, sentence, and document levels. Aspect-based SA, also known as feature-level SA, extracts multiple features from the text reviews. It provides a deep study of reviews and extracts the context of reviewers for a particular domain (Thet et al. 2010; García-Pablos et al. 2018). The aspect-level approach mainly depends on the syntactic features of the text reviews (Che et al. 2015). The sentence-based SA approach finds the polarity of a particular sentence. Here, words are linked together to form a sentence, and the N-Gram technique, which groups the words into sequences of one, two, or three, is used to extract the polarity of that sentence.
Sometimes the N-Gram technique fails to find the relationship between these words. Therefore, dependency trees and typed dependencies have been introduced to address the word separation problem in text classification (Meena and Prabhakar 2007). In sentence-level classification, each sentence is considered a separate unit and is assumed to express only one opinion: positive, negative, or neutral (Jagtap and Pawar 2013). In the document-based approach, each document is considered a single unit, and a single opinion is assigned to the whole document. The Bag-of-words approach is very popular and provides more accuracy in handling complexity in document-level SA (Bhatia et al. 2015). Most sentence-level applications try to achieve good accuracy over the whole document (Zhang et al. 2009). SA and opinion mining are two popular fields that help to extract opinionated information from online social platforms. The two terms are commonly used interchangeably. However, some researchers use them for slightly different problems: SA is used to detect the sentiment of reviews as neutral, negative, or positive, whereas opinion mining is used to analyze a text's subjectivity (Tsytsarau and Palpanas 2012). Previous research very frequently employed machine learning and heuristic-based methods. Heuristic-based methods mainly depend on semantic features and linguistic characteristics, whereas machine learning-based algorithms are classified into unsupervised, supervised, and ensemble learning.
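The N-Gram grouping described above can be sketched as follows. This is a minimal illustration only: the function name `word_ngrams` and the whitespace tokenizer are assumptions made for the example, not part of any surveyed system.

```python
def word_ngrams(text, n):
    """Return the word n-grams of `text` as a list of tuples."""
    # Naive lowercase + whitespace tokenization (an assumption of this sketch).
    tokens = text.lower().split()
    # Slide a window of n tokens over the sentence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "The movie was not entertaining"
unigrams = word_ngrams(sentence, 1)
bigrams = word_ngrams(sentence, 2)
trigrams = word_ngrams(sentence, 3)
```

The bigram ("not", "entertaining") keeps the negation next to the word it modifies, whereas unigrams lose that link. Relations between words that are further apart than n tokens are still missed, which is the limitation that dependency-based methods address.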
Several articles on SA using different techniques have been published, which creates a need for a deep study that summarizes the trends and aspects of SA. A comparative study and a detailed survey were presented by Xia et al. (2011) and Giachanou and Crestani (2016), respectively. Xia et al. (2011) provided a comparative study of ensemble-based techniques for SA but did not cover the advanced ensemble approaches of this field. Giachanou and Crestani (2016) presented an in-depth survey of Twitter SA and summarized the previously proposed approaches to SA on Twitter. However, this survey did not implement any recent techniques for comparative discussion and did not explore the latest updates in this field. Here, we provide a detailed SA survey and present the recent facts and trends related to this field. This study investigated research work from 1996 until 2022 using online repositories and tried to cover all the essential aspects of SA, which will provide deeper information to upcoming researchers in a single manuscript. Extensive experiments have also been conducted on different domains to identify the best ensemble approach for the sentiment classification task. This analytical study was mainly conducted for sentence-level SA using ensemble machine-learning techniques. Furthermore, the ensembles used in the experiments are grouped into two major categories: bagging and boosting. Accordingly, eight ensemble learners were implemented, of which five belong to the boosting approach and three to the bagging approach. Figure 1 presents the summarized taxonomy of our social network SA survey.
Fig. 1 Taxonomy of social network SA

In an ensemble approach, multiple learners learn together to obtain more accurate and efficient results than individual learners. Ensemble methods have been used in NLP applications and have proven better than a single method (Zhang et al. 2009). The Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models combined with the averaging method generate better results than the individual models (Minaee et al. 2019). Although governments, businesses, and individuals are always interested in calculating the polarity and sentiment of reviews, no consistent conclusion is available on which methodology is best for this process. Therefore, to find conclusive results, this study compares eight ensemble techniques on four popular datasets to investigate the performance of ensemble models for SA. The main objective of this study is to explore the latest research on sentiment classification with a comparative analysis of ensemble-based techniques. To this end, we formulate five research questions.
• RQ1 What are the different approaches, publishing platforms, and benchmark datasets used by researchers for SA?
To discover the most popular approaches and datasets used in the field of SA. This would help researchers understand the current state of this area.
• RQ2 What are the major challenges that researchers face during sentiment calculation from text reviews?
To discuss the challenges in the field of NLP together with their proposed solutions.
• RQ3 What are the distinct feature engineering techniques for selecting the essential features from text reviews?
To explain the various feature engineering techniques for dimensionality reduction of text datasets. Thus, many critical research papers have been collected from different publishing sites to map popular feature engineering techniques for text datasets.
• RQ4 What are the researchers' emotion theories for detecting emotions from social content, including text, images, and videos?
To identify the common emotions that are present in the emotion sets of prestigious theorists. It would provide the best emotion set to future researchers for opinion extraction from social content, including text, images, and videos.
• RQ5 Which is the best ensemble technique for sentiment classification, and what are the future opportunities of SA?
To discover the ensemble technique that provides the best results in terms of all standard measures. Hence, various experiments were conducted on different domains to select the best technique for text classification. It would be helpful for SA-related applications. Future opportunities related to SA are also discussed.
The remaining sections of this study are organized as follows: Sect. 2 presents the extensive literature survey related to SA. Section 3 elaborates on the important aspects of SA. Section 4 describes the methodology used for the comparative study. Section 5 presents the comparative results and analysis. Section 6 discusses the future opportunities of SA. Finally, Sect. 7 presents the study's conclusion and addresses some open issues for future research.

Literature survey
SA is extensively used to extract people's opinions, emotions, and sentiments toward a particular brand, business, place, or product. Various techniques and approaches are also introduced to classify the sentiments as the demand for SA increases. After analyzing the vast literature on sentiment classification, we have concluded that SA can use five significant approaches. Figure 2 presents the classification of all the major approaches used by researchers for sentiment classification.
First, the lexicon-based approach uses a manually or automatically generated list of positive, negative, or neutral polarity terms for sentiment classification. The lexicon approach computes the semantic orientation of phrases and words in sentences and documents to reveal the sentiments. Usually, the lexicon-based approach uses adjectives to indicate the semantic orientation (Taboada et al. 2011). Second, the machine learning approach is a widely adopted technique for SA. Most researchers prefer a machine learning-based approach for sentiment classification due to its fast execution and reliable results. Machine learning provides various single learners, namely Naïve Bayes (NB), K-Neighbors (KN), Linear Regression (LR), Support Vector Machine (SVM), and so forth. Third, the graph-based approach selects the nodes and vertices based on the features (reviews and tweets) available in the input material. Various graph-based models such as Enterprise Graphs, Hyper-graph, Hashtag Graphs, N-Gram Graph, and Co-Occurrence Graph are available for an effective SA process (Krishnakumari and Akshaya 2019). Fourth, the ensemble approach combines multiple weak learners to form a powerful learner. Various ensemble learners, namely Random-Forest, Extra-Tree, Meta-Estimator, Ada-Boost, Gradient-Boosting, Light-GBM, Cat-Boost, and Extreme Gradient-Boost, are available to make the sentiment process more effective than the lexicon approach and the single machine learning approach. Fifth, the hybrid approach, the most potent of the five, enhances the capability of a sentiment classification model by integrating machine learning with the lexicon-based approach or by combining multiple machine learning algorithms. A hybrid approach is a novel idea that researchers present to build a richer and more robust model for solving a particular problem.
Researchers perform various experiments with different techniques on specific data and try to create a more effective model than a single or ensemble model. For example, a linguistic dictionary and SVM were combined to build a hybrid model for sentiment classification of political tweets that achieved 93% accuracy, which is significant enough to help politicians make strategies for future elections. Here, we categorized all the previous research into two parts: Sentiment Analysis (SA), which studies the subjective information in the text, and Sentiment Classification (SC), which identifies the opinions in the text and assigns a particular label to them.

Fig. 2 Classification of proposed SA approaches

Lexicon-based approach
Lexicon-based approaches operate on phrases and opinion words without prior knowledge of labels. Here, collective phrases are treated as an opinion lexicon along with negative and positive words. Opinion lexicons determine the orientation of the terms available in the text dataset. The lexicon-based approach is categorized into two parts: the Dictionary-Based approach, which judges the sentiment based on phrases available in lexicons, and the Corpus-Based approach, which extracts the context present in the text. Table 1 reports the list of lexicon-based research from 2011 to 2022.
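A dictionary-based polarity decision of the kind described above can be sketched as follows. The lexicon `TOY_LEXICON` and its scores are hypothetical values invented for this illustration; they are not taken from any published opinion lexicon.

```python
# Hypothetical toy opinion lexicon: word -> polarity score (assumption).
TOY_LEXICON = {
    "good": 1, "great": 1, "excellent": 1,
    "bad": -1, "poor": -1, "terrible": -1,
}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Label `text` by summing the polarity scores of its words."""
    # Lowercase each token and strip surrounding punctuation.
    tokens = [tok.strip(".,!?;:").lower() for tok in text.split()]
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

No labeled training data is needed, which matches the description above. On the other hand, this simple sum ignores negation and context, which is where corpus-based methods and the later approaches come in.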

Machine learning-based approach
Machine learning is the most promising approach for SA. Usually, machine learning-based SA provides a higher accuracy score than the lexicon-based approach. It offers various feature engineering techniques that extract the critical features from the dataset and improve the efficiency of SA. Different supervised and unsupervised algorithms are available for sentiment classification. The supervised approach works on labeled datasets and learns a mapping function from the inputs to the output labels. In contrast, unsupervised learning learns patterns from unlabeled data using clusters. Table 2 presents a few popular studies related to machine learning-based SA from 2011 to 2022.

Graph-based approach
A graph-based approach connects interrelated words in text reviews to calculate the sentiment and opinion of people, where the vertices and nodes correspond to features available in the reviews. Various graph-based methods and algorithms have been applied in recent decades to solve the problem of SA. Table 3 lists the studies that have been carried out in the area of SA using graph-based methods.

Ensemble approach
Ensemble learning is a process of strategically combining several learners to form an intelligent model. It improves classification by reducing the risk of an unfortunate selection of a poor learner. It draws on the capability and knowledge of various learners, which increases classification accuracy and decreases prediction errors. Decisions based on diverse learners make ensemble learning more accurate and trustworthy than single learning. Table 4 presents the research work done in SA using an ensemble approach from 2011 to 2022.
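The effect of combining diverse learners can be illustrated with a small majority-voting sketch. The three base learners and their predictions below are made-up toy values, and majority voting is used here only as one simple combination rule; it is not the specific bagging or boosting procedure evaluated in this study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-learner label lists into one list by majority vote."""
    # zip(*predictions) groups the labels that all learners gave to one sample.
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy data (assumption): each base learner is wrong on a different sample.
truth = ["pos", "neg", "pos", "neg"]
learner_a = ["pos", "neg", "pos", "pos"]
learner_b = ["pos", "pos", "pos", "neg"]
learner_c = ["neg", "neg", "pos", "neg"]

ensemble = majority_vote([learner_a, learner_b, learner_c])
```

Because the three learners err on different samples, each individual error is outvoted. This is the sense in which diversity among learners makes the combined decision more reliable. With an even number of learners, ties would need an explicit tie-breaking rule.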

Hybrid approach
The hybrid approach utilizes the capability of various approaches such as rule-based, lexicon-based, machine learning, or deep learning-based ones. It enhances the efficiency of the SA model and yields optimum results. Researchers devise such combinations to develop the best approach for a particular task. Hybrid learning is categorized into semi-supervised, multi-instance, and self-supervised learning. Multi-instance learning does not use individual labels; instead, it receives labeled bags, each containing various instances, and thereby explicitly treats problems with incomplete knowledge of the training examples. Self-supervised learning generates labels by itself and utilizes supervised algorithms to solve unsupervised problems. Hybrid models show significantly more improvement in classification than other methods. Table 5 presents work related to hybrid SA.

Extensive literature analysis
This section presents a deep analysis of the literature in the field of SA. Various graphs and tables are used to discuss algorithms, datasets, approaches, and the most popular platforms related to SA. SA has been employed in numerous real-life applications, and researchers are therefore investigating it in more depth. Hence, this section focuses on the essential points that are required for further research in this area.

Growth in publications of SA
This section shows the growth in the number of SA publications. As shown in Fig. 3, there were very few SA-related publications in the early years (2010, 2011, and 2013). As the demand for social platforms increased, publications associated with SA have also grown since 2014. In 2016, 2017, and 2019, numerous researchers proposed good research related to SA using machine learning, ensemble learning, and hybrid techniques.

Publication platform for SA
A total of 92 documents published from 2010 to 2022 were collected. We found 48 different journals for SA publications. The journals that occur more than once in our collection are reported in Table 6. Various conferences have also been held for SA publications. Elsevier, Springer, and ACM are the three most popular publishers chosen by researchers for authentic SA-related research, with Elsevier and Springer being the most common. Additionally, several other platforms are open to SA-related research.

Graph-based SA studies (table rows recovered without column headers):
Bordoloi et al. (2020)   2020   ✓   Customer reviews     Co-occurrence graph           0
Xia et al. (2021)        2021   ✓   Customer reviews     Relational graph              11
Liang et al. (2022)      2022   ✓   Tweets and reviews   Graph convolutional network   80

Popular datasets for SA
A benchmark dataset plays a vital role in sound research. Figure 4 presents the benchmark datasets used by researchers for SA. The researchers have used several resources, namely movie reviews, product reviews, Facebook posts, and tweets. The graph shows that researchers most frequently use product reviews for their experiments, followed by Twitter data. A few researchers also generate their own datasets for sentiment classification. In addition, researchers consider movie reviews, medical reviews, and hotel reviews for SA-related experiments.

More favorable techniques for SA
This section provides knowledge about the methods most frequently implemented for SA-related problems. We explored the more popular techniques related to the lexicon-based, machine learning-based, ensemble-based, graph-based, and hybrid-based approaches for SA applications. Table 7 presents the machine learning algorithms frequently used by researchers for the SA process, where SVM is at the top of the list. NB was also persistently used by the researchers, but SVM produced the most noticeable results for sentiment classification. Table 8 shows the frequency of ensemble-based techniques used for SA. It is observed that bagging and boosting are the techniques researchers most commonly use for ensemble sentiment learning. The concept of majority voting has also frequently been implemented by researchers in different combinations of single learners. Table 9 presents the graph-based techniques for SA. There is considerable variation in the choice of graph-based methods for SA, but the word-graph and the co-occurrence graph were each used by two researchers in N_set = 10.
Hybrid SA is very much in demand. So, we also surveyed various papers related to hybrid SA and categorized the hybrid work into five significant categories, presented in Table 10. We found that in most of the hybrid work, researchers applied the combination of the lexicon approach and the machine learning approach, which has been applied seven times in N_set = 19. Machine learning has been used individually five times, and a combination of machine learning and rule-based approaches has been used four times in previous work (N_set = 19). The combination of machine learning with genetic and deep learning was found to be very rare.

Machine learning algorithms and their frequency of use (recovered table rows):
Algorithm                         Frequency
Naïve Bayes (NB)                  9
Maximum entropy (ME)              3
Tree                              3
K-nearest neighbors (KNN)         2
Artificial neural network (ANN)   1
K-Star (K*)                       1

Important aspects of SA
SA has been an exciting field of study since the 1990s, and it has various sub-fields for research. Merriam-Webster defines sentiment as a thought, judgment, or attitude that arises from feeling. It is an idea or opinion shaped by emotions. This section presents the various essential aspects of SA.

SA challenges
SA is an emerging field, but it faces various challenges that complicate the process and decrease the efficiency of related models. Although researchers are working to solve these issues using different techniques, there is still a lack of accuracy. These challenges are obstacles to extracting the correct meaning of sentiments and classifying the correct polarity. Common challenges of SA are mostly related to the language used in online social networks. Additionally, the words that are regularly spoken around us influence the words used on online platforms. It is also noticeable that language used on social media is more malleable than formal words, including formal, informal, and

SA feature engineering
The number of N features determines the dimensionality of the dataset. Feature engineering is a very important step in SA applications and opinion mining. In optimal feature engineering, feature selection and feature extraction should be integrated with the final processing (Kohavi and John 1997). This section provides information regarding various types of feature engineering techniques that have been previously applied for text preprocessing. Figure 5 depicts the process of feature engineering, which is completed in four steps:
(1) Original Feature Set: This holds the raw elements of the dataset that need processing.
(2) Adding Weights: All the calculations are performed, and weights are assigned to the selected features by normalization and scaling methods.
(3) Feature Ranking: The features are arranged in a specific order by the value of some scoring function.
(4) Final Feature Subset: This represents the finally selected N features, which are ready for the final calculation.
Dimensionality reduction reduces the high dimensionality of the dataset while keeping the more discriminative and constructive features of the collection. Feature engineering is categorized into two major parts: (1) Feature Extraction and (2) Feature Selection. Feature extraction is a process of deriving a reduced set of essential features from the original set. Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA) are the two popular techniques of feature extraction (Zareapoor and Seeja 2015). Feature selection, in turn, is a process that reduces the number of variables for predictive models. Effective and efficient feature selection improves the performance of SA. The feature selection process includes missing values removal, low variance removal, highly correlated feature removal, univariate selection, and recursive elimination. Feature selection methods are categorized into two groups: filtered methods and wrapper methods (Uysala and Gunal 2014).
Filtered methods do not depend on learning models or classification algorithms and can be applied quickly and easily. Chi-Squared (CHI), Mutual Information (MI), Document Frequency (DF), Gini Index (GI), Information Gain (IG), and Distinguishing Feature Selection (DFS) are filtered feature selection methods. In contrast, wrapper methods depend on learning models and follow their rules accordingly. Tabu Search, Genetic Algorithms, and Particle Swarm Optimization (PSO) are wrapper feature selection methods. Figure 6 presents the taxonomy of feature engineering/dimensionality reduction.
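The four-step feature engineering process can be sketched as a small filter-style pipeline. In this illustration the weight of a term is its TF-IDF value summed over all documents; that choice, the whitespace tokenizer, and the function name `select_top_features` are assumptions of the sketch, not the procedure of any surveyed paper.

```python
import math
from collections import Counter


def select_top_features(documents, n_features):
    """Return the `n_features` highest-weighted terms of `documents`."""
    # (1) Original feature set: every distinct token in the corpus.
    tokenized = [doc.lower().split() for doc in documents]
    tokenized = [doc for doc in tokenized if doc]  # skip empty documents
    vocabulary = sorted({tok for doc in tokenized for tok in doc})

    # (2) Adding weights: length-normalized term frequency times IDF.
    n_docs = len(tokenized)
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    weights = {}
    for term in vocabulary:
        idf = math.log(n_docs / doc_freq[term])
        tf_total = sum(doc.count(term) / len(doc) for doc in tokenized)
        weights[term] = tf_total * idf

    # (3) Feature ranking: highest weight first, ties broken alphabetically.
    ranked = sorted(vocabulary, key=lambda term: (-weights[term], term))

    # (4) Final feature subset: keep the top N terms.
    return ranked[:n_features]


reviews = [
    "the plot was great",
    "the acting was terrible",
    "the plot was terrible",
]
top_terms = select_top_features(reviews, 4)
```

Terms that occur in every review, such as "the" and "was", receive a weight of zero and drop out, so the selected subset keeps the more discriminative words.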

Feature extraction
In the SA task, reviews and documents hold millions or billions of tokens, which makes the text classification process more complex. Feature extraction is a dimensionality reduction method that reduces the N dimensions of the dataset and presents it in a more predictive and compact way (Gomez et al. 2012). The reduced set is easier to handle due to its size and contains only the essential features for the process.

SA challenges and proposed solutions (table rows recovered from the extraction):

Sarcasm [challenge name inferred; description not recovered]. Proposed solutions: Bouazizi et al. (2016) proposed four types of features, namely "Sentence-Related" features, "Punctuation-Related" features, "Syntactic and Semantic" features, and "Pattern-Related" features, to handle sarcastic tones differently. Joshi et al. (2015) presented word-embedding-based features and provided improvements over the previous four reported features.

Negation handling. Extracting the negation-affected patterns from sentences is a difficult task in SA. Negation handling is the process of reversing the polarity of phrases, words, or sentences where the scope of the affected terms is not fixed. For example, in "The movie was not entertaining", the scope is the single word after the negation word "not". In another sentence, "I do not call this movie entertaining", the scope is not limited to the word directly after "not". So, this is a challenging task for a system in SA. [Proposed solutions not recovered.]

Domain dependence [challenge name inferred; description not recovered]. Proposed solutions: [...] (2020) provided an effective way to use the pre-trained BERT language model to handle the domain dependence problem and also proposed a novel domain-distinguish task for pre-training.

Huge lexicon. Every day, many reviews, opinions, news items, and blogs are posted on social media sites. They contain a huge vocabulary (lexicon), and handling it is a very complex task in SA. Proposed solutions: Kaushik et al. (2014) performed SA using the Hadoop technique to handle large datasets.

Word-sense disambiguation. Disambiguation is the problem of identifying a word's sense, because a single word can have different meanings. The context of a sentence can resolve the word sense. For example, if the word "small" relates to a house, it reflects a negative sense, but it represents a positive sense for pet animals or other things. So, this is a challenging task for a system. Proposed solutions: Yu et al. (2003) and Hu et al. (2004) initiated a lexicon dictionary where words were linked by prior polarity context. The polarity of a word in a phrase may differ from its prior polarity because a word can reflect different meanings.

Anaphora resolution. Although pronouns play an essential role in accurate sentiment extraction, they are ignored most of the time. For example, in "The television is good. It has a big screen and good resolution", the big screen and good resolution cannot be attributed to the television without resolving the pronoun "it", which leads to the problem of anaphora resolution. Proposed solutions: Ge et al. (1998) incorporated various factors of anaphora resolution into a statistical framework that maps the distance between the pronoun and the proposed antecedent. Lappin et al. (1994) proposed an algorithm to identify the antecedents of third-person pronouns and lexical anaphors.

PCA
It is a popular technique that reduces the dimensionality of the dataset by converting the original attributes into a smaller set of derived components. First, the mean of each feature is calculated [Eq. 1] (Kumar et al. 2017); the mean vector is an N × 1 column vector. To treat the different attributes on the same scale, each coordinate is rescaled to unit variance [Eq. 2] by replacing X(i) with X(i)/σ_j. After this preprocessing, the covariance matrix, denoted by Σ (Greek capital letter sigma), is calculated [Eq. 3], and its eigenvectors give the principal components.
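The preprocessing and covariance steps can be sketched in plain Python. The sketch assumes that the three steps take their usual textbook form, that is, the feature mean $\mu_j = \frac{1}{m}\sum_{i} x_j^{(i)}$, the variance $\sigma_j^2 = \frac{1}{m}\sum_{i} (x_j^{(i)} - \mu_j)^2$, and the covariance matrix $\Sigma = \frac{1}{m}\sum_{i} x^{(i)} x^{(i)T}$ of the standardized data. The leading eigenvector is found by power iteration, which is a choice made for this illustration; the function names and the toy data are likewise assumptions.

```python
import math


def standardize(rows):
    """Center every feature and rescale it to unit variance."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    # Assumes no feature is constant (otherwise the deviation is zero).
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / m)
            for j in range(n)]
    return [[(r[j] - means[j]) / stds[j] for j in range(n)] for r in rows]


def covariance_matrix(z):
    """Covariance matrix of already standardized rows."""
    m, n = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / m for b in range(n)]
            for a in range(n)]


def first_principal_component(cov, iterations=200):
    """Leading eigenvector of `cov`, found by power iteration."""
    n = len(cov)
    v = [1.0 / (k + 1) for k in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data (assumption): two strongly correlated features.
data = [[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8]]
z = standardize(data)
pc1 = first_principal_component(covariance_matrix(z))
# Project every sample onto the first component: 2 features become 1.
scores = [sum(r[j] * pc1[j] for j in range(len(pc1))) for r in z]
```

Since the two toy features are almost perfectly correlated, a single component retains nearly all of the variance, which is why the two columns can be replaced by one.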

LSA
LSA is a dimensionality reduction and feature extraction technique used in text classification. LSA works by analyzing concepts, terms, and the relationships between them in unstructured texts. It can correlate semantically related terms that are latent in the text. LSA is used for text clustering and page retrieval systems. LSA resolves the problem of words with more than one meaning and of different words with similar meanings (Zareapoor and Seeja 2015).

Feature selection
Feature selection removes irrelevant and duplicate attributes from the dataset that do not contribute to the predictive model's performance and accuracy. It improves the model's performance, yields cost-effective predictors, and provides simpler and more reliable models. Feature selection is a powerful tool to simplify or speed up the calculations of the learning model (Dasgupta et al. 2007). Feature selection is further categorized into the filter method and the wrapper method.

Filtered method
This method uses diverse scoring techniques to assess the relevance of features independently of the learning classifiers or models. These techniques scale well to high-dimensional datasets and provide fast and simple computations (Guyon and Elisseeff 2003). Various filter methods are available for text classification and SA.
• CHI It is a popular statistical method of feature selection that estimates the feature independently by calculating the chi-square corresponding to the class. It analyses the dependency between the term and class. It calculates 0 for the independent relationship and 1 for the dependent relationship between term and class (Zareapoor and Seeja 2015). CHI provides the significance difference formation and provides the significance difference information between categories (McHugh 2013). ( The CHI [Eqs. 4] and [5] calculate the association between the features of the word and the associated class (Sharmac and Dey 2012). Here, A represents frequency when t = term and c i = class co-occur, B is a count while t appears without c i , E means events while c i appears without t , D represents frequency while neither c i nor t appears, and N shows entire documents of the corpus. The score of CHI will be 0 when t and c i are not dependent on each other. • MI MI presents the association or dependence between the two random variables.
MI finds the dependence between term t and class c. It describes the amount of information contained by a term for the associated class [Eq. 6] (Yang and Pedersen 1997). It is calculated as: Here, P represents the probability of term t , and P(t|c) represents the probability of term t of assigned class c . MI measures the much information is communicated on average from one random variable to another. P(t) and P(c) are the marginal distribution of t and c get through the marginalization process.
• DF Document-frequency thresholding is the most straightforward technique for reducing the vocabulary in text classification. It scales easily to massive corpora, with computational complexity that is linear in the number of training documents. It is usually regarded as an ad hoc approach rather than a principled criterion for feature selection. DF represents the number of documents in which a term appears. DF follows the assumption that infrequent terms are non-descriptive for the prediction of categories (Yang and Pedersen 1997). This method removes those features whose document frequency is greater or less than the predefined threshold.
• GI It is an improved version of the attribute selection algorithm used for feature selection (Alper Kursat Uysalab 2016). It works as a split measure for selecting the most appropriate splitting attribute in the decision tree. A simple formula is utilized to calculate the GI [Eq. 7]. Where P(t|C_i) shows the probability of term t given class C_i, and P(C_i|t) shows the probability of class C_i given the presence of term t. M represents the number of class labels and P shows the proportion of the i-th class label. GI is thus a measure of impurity; hence the feature with minimum impurity is selected for the best split.
• IG It is a feature selection technique that reduces the number of features by computing and ranking the value of attributes. It measures how much the presence or absence of a term contributes to accurate classification. IG gives a higher score to those terms that hold relevant information for text classification. It is a global feature selection metric that calculates only one score for a particular term [Eq. 8] (Alper Kursat Uysalab 2016). Where M represents the number of classes, P(C_i) is the probability of class C_i, P(t) and P(t̄) are the probabilities of the presence and absence of term t, and P(C_i|t) and P(C_i|t̄) are the conditional probabilities of class C_i given the presence and absence of t.
• DFS It is a recent feature selection method and a global metric for text classification. DFS selects distinctive features from the collection and eliminates ambiguous ones based on predefined criteria [Eq. 9] (Uysalc and Gunal 2012). Where M represents the total number of classes, P(C_i|t) shows the conditional probability of class C_i in the presence of term t, P(t̄|C_i) presents the conditional probability of the absence of t in C_i, and P(t|C̄_i) represents the conditional probability of t for all classes except C_i.
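As a rough illustration of the filter scores described above, the sketch below computes the chi-square statistic and a pointwise mutual-information score from the four contingency counts A, B, E, and D. It assumes the standard formulations from the text-categorization literature, which may differ in detail from Eqs. 4 to 6; the function names are hypothetical. Note that log(P(t|c)/P(t)) equals log(P(t, c)/(P(t) P(c))), which is the form used in the code.

```python
import math


def chi_square(a, b, e, d):
    """Chi-square score of term t for class c_i from a 2x2 contingency table.

    a: documents of c_i that contain t      b: documents outside c_i that contain t
    e: documents of c_i that do not contain t
    d: documents outside c_i that do not contain t
    """
    n = a + b + e + d
    denominator = (a + e) * (b + d) * (a + b) * (e + d)
    if denominator == 0:
        return 0.0  # degenerate table: no evidence of dependence
    return n * (a * d - e * b) ** 2 / denominator


def pointwise_mi(a, b, e, d):
    """Pointwise mutual information log(P(t, c) / (P(t) * P(c)))."""
    n = a + b + e + d
    p_tc = a / n          # joint probability of term and class
    p_t = (a + b) / n     # marginal probability of the term
    p_c = (a + e) / n     # marginal probability of the class
    if p_tc == 0:
        return float("-inf")  # term never occurs with the class
    return math.log(p_tc / (p_t * p_c))
```

Both scores are 0 when the term and the class are independent, for example for the counts (25, 25, 25, 25).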

Wrapper method
The wrapper method uses a specific learning algorithm to evaluate candidate features during feature selection. Its computational cost is high, and processing is slow. Wrapper methods are therefore not usually preferred in SA and text classification (Baccianella et al. 2013). Wrapper methods are based on optimization concepts and intuitive search. They are used to find better features and reduce duplicate elements using cross-validation (Inza et al. 2004).
• Tabu Search It integrates learning techniques to evaluate only promising feature subsets. Tabu search generates better accuracy than a genetic algorithm, heuristic search algorithm, PSO, and an evolutionary search for text classification (Alper Kursat Uysald 2018). Most classification approaches are evaluated by accuracy, which Tabu search calculates as in [Eq. 10] (Mousin et al. 2016). After that, to get a more interpretable learning model, the number of selected features should be minimized [Eq. 11].
• Genetic Algorithms (GA) GA is a random search-based feature selection method that works on the principles of biological mechanisms. It follows the procedure of genetic evolution in biology: it starts from an initial feasible population and then applies crossover and mutation (Lei 2012). GA is a promising way to handle conditional optimization problems and is used extensively for feature selection.
• PSO It is used to select the most optimal feature from the collection set that provides the most remarkable difference between metallic particle classes in terms of their dimensions. PSO offers various advantages for powerful exploration. PSO has memory, inexpensive computation capability, potential population solution, address binary and dis-

SA emotion theories
Emotion extraction and classification are essential parts of SA. So, here we introduce some types of basic emotions considered by researchers in SA and classification, focusing on a standard emotion set that is common across studies. Automatic human facial expression extraction is an emerging application of Human-Computer Interaction (HCI) and affective computing. Therefore, emotion extraction and classification have become prime aspects of SA research. Several researchers have been working on distinctive sets of emotions and expressions. Gunesa et al. (2005) presented automatic emotion recognition from the face and body using early fusion and late fusion approaches. Their study was performed on eight prototypical expressions: disgust, fear, anger, sad, happy, surprise, happy surprise, and uncertainty. Gunesb et al. (2008) used twelve emotions: disgust, fear, sadness, happiness, anger, uncertainty, anxiety, positive surprise, negative surprise, neutral surprise, boredom, and puzzlement for facial expression and body gesture extraction. Hablani et al. (2013) evaluated binary patterns for facial recognition of a person and classified their expressions according to seven basic emotions: disgust, fear, anger, sadness, happiness, surprise, and neutrality. Chen et al. (2013) used appearance and temporal motion features for facial and body gesture recognition. They classified the emotions into ten categories: disgust, fear, anger, sadness, happiness, surprise, anxiety, boredom, puzzlement, and uncertainty. Hayat et al. (2014) presented an automatic facial recognition framework with six basic emotions: disgust, fear, happiness, anger, surprise, and sadness. Table 12 displays a few recent sets of emotions that researchers have frequently used, along with their findings regarding visual, motion, and sound effects. These sets of emotions will be helpful for beginners in emotion mining.
Figure 7 provides a better illustration of the emotion sets used by the researchers in their facial recognition works. According to Table 11 and Fig. 7, "disgust, fear, happy, sad, anger, and surprise" are common emotions used by different researchers.

Methodology used for comparative analysis
This section presents the methodology used for ensemble classification of text reviews in sentence-level sentiment classification. The ensemble approach of machine learning has been used in various applications and has produced outstanding results. Ensemble learning is also applicable to the SA task. Therefore, we present a comparative analysis of diverse ensemble methods that are divided into two main categories: bagging and boosting. This study compares eight popular ensemble learners (Random-Forest, Extra-Tree, Meta-Estimator (Linear SVC), Ada-Boost, Gradient-Boosting, XGB, Cat-Boost, and Light-GBM) to choose the best model for SA. The experiments have been conducted on reviews from four different domains: Uber reviews, Restaurant reviews, Amazon reviews, and Food reviews. Figure 8 presents the comprehensive structure of the methodology used for comparative analysis. The following sub-sections provide detailed information regarding the comparative methodology.

Dataset collection
Dataset collection is the initial step of any study, and it plays a crucial role in authentic experiments. Four leading review resources (Uber, Restaurant, Amazon, and Food) have been chosen to verify the authenticity of the experiments. The Uber reviews dataset contains 1344 customer ride reviews, the Food reviews dataset holds 25,000 records, and the Amazon product and Restaurant reviews datasets each hold 1000 records. Both large and small datasets were collected so that the ensemble models could be compared more thoroughly. The experimental datasets contain positive and negative reviews, where positive sentiments are denoted by 1 and negative sentiments are denoted by 0. Table 13 displays the number of positive and negative reviews contained in each dataset.

Data preprocessing
Preprocessing is required to convert raw data into a machine-understandable form. First, we organized the datasets by rectifying the spelling errors, antonyms, and missing fields. After that, basic steps such as punctuation removal, whitespace removal, URL removal, number removal, and hash-tag removal were applied to clean the reviews. These preprocessing steps are needed to get an accurate score for SA because machine learning cannot work effectively on raw and noisy datasets.
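The cleaning steps listed above can be sketched with regular expressions as follows. This is a minimal illustration rather than the pipeline used in the experiments: the function name `clean_review` and the exact patterns are assumptions, and spelling correction and missing-field handling are not covered.

```python
import re
import string


def clean_review(text):
    """Apply URL, hash-tag, number, punctuation, and whitespace removal."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URL removal
    text = re.sub(r"#\w+", " ", text)                   # hash-tag removal
    text = re.sub(r"\d+", " ", text)                    # number removal
    # Punctuation removal: delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Whitespace removal: collapse runs of whitespace and trim the ends.
    return " ".join(text.split())
```

URLs and hash-tags are removed before punctuation so that their markers ("://" and "#") are still present when the patterns are applied.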

Tokenization
Tokenization is a fundamental splitting phase in SA that partitions a sentence, phrase, or paragraph into single units called tokens. Here, tokens can be either characters or words that are individually counted. Tokenization is a building block of NLP and is commonly implemented with the n-gram approach. An n-gram is a sequence of n items in the text or speech. N-grams can be categorized into unigrams, bigrams, or trigrams [Eq. 12].

Fig. 7 Taxonomy of frequently used emotion sets by researchers
where X denotes the total number of words in the sentence S , and the value of N will be 1 for unigram, 2 for bigram, and 3 for trigram. In unigram, sentences or phrases are split into the tokens of one word. In bigram, two words together are treated as a single token, and in trigram, three words together are treated as single tokens.
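A minimal sketch of word-level n-gram tokenization is shown below. It assumes whitespace-separated words and the usual count of X - (N - 1) n-grams for a sentence of X words, which is an assumption about the form of Eq. 12; the function name `ngrams` is hypothetical.

```python
def ngrams(sentence, n):
    """Return the word-level n-grams of a sentence as space-joined tokens."""
    words = sentence.split()
    # A sentence of X words yields X - (n - 1) n-grams (none if X < n).
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
```

For example, `ngrams("the food was great", 2)` yields the three bigrams "the food", "food was", and "was great".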

TF-IDF vectorization
Vectorization is the process of converting text into meaningful, informative numbers. It measures the frequency of a word in a document and generates a number accordingly. TF is calculated as the number of times an individual word occurs in a document divided by the total number of words in that document. IDF is used to assign higher weights to rare words in the documents. TF-IDF is calculated in [Eq. 13], where N represents the total number of documents, tf_ij is the number of occurrences of term i in document j, and df_i is the number of documents that contain term i (Term Frequency xxxx).
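The weighting described above can be sketched in plain Python. The sketch assumes the common form w_ij = tf_ij × log(N / df_i) with a natural logarithm and whitespace tokenization; library implementations typically add smoothing and normalization, so their values will differ. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # df[term] = number of documents that contain the term at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = relative frequency in the document, idf = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document receives a weight of 0, which is how IDF discounts uninformative words.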

Ensemble techniques
Machine learning supports two types of ensemble techniques: bagging and boosting. Bagging selects random samples from the training set and trains multiple learners in parallel. In contrast, boosting draws on the output of the previous learner and trains the learners sequentially. This section describes all the ensemble techniques implemented for comparison. These algorithms are divided into two groups, bagging and boosting. From the bagging group, we selected Random-Forest (RF), Extra-Tree (ET), and Meta-Estimator (Linear SVC) (M-SVC) for the implementation of SA, and from the boosting group, Ada-Boost (AB), Cat-Boost (CB), Gradient-Boosting (GB), XG-Boost (XGB), and Light-GBM (LGBM) were implemented.

Bagging ensemble approach
Bagging combines homogeneous classifiers and trains them in parallel on random samples. First, multiple bootstrap samples are created that act individually. After that, base learners are fitted on them, and finally, their outputs are aggregated. Bagging is a popular ensemble approach that helps to reduce the variance of classifiers. Table 14 illustrates the procedure of the bagging ensemble approach (Polikar 2006).
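The three steps above (bootstrap sampling, fitting base learners, and aggregating their outputs) can be sketched as follows. The base learner here is a deliberately simple, hypothetical nearest-class-mean classifier on a single numeric feature, chosen only to keep the example self-contained; it is not one of the learners used in the experiments.

```python
import random
from collections import Counter


def fit_nearest_mean(sample):
    """Toy base learner: predict the class whose mean feature value is nearest."""
    sums, counts = Counter(), Counter()
    for x, label in sample:
        sums[label] += x
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def bagging_fit(rows, fit_base, n_learners=11, seed=0):
    """Fit one base learner per bootstrap sample of (feature, label) rows."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        # Bootstrap sample: draw len(rows) rows with replacement.
        sample = [rng.choice(rows) for _ in rows]
        learners.append(fit_base(sample))
    return learners


def bagging_predict(learners, x):
    """Aggregate the base learners' outputs by majority vote."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]
```

An odd number of learners is used so that a two-class majority vote cannot end in a tie.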

Random-forest
It is a powerful technique to handle large datasets quickly. Various applications have used it for accurate and effective results. Random-Forest constructs multiple decision trees (DTs) that classify a new instance by majority voting. Each node of a DT uses a randomly selected sample from the whole original sample set. In other words, every tree uses a different bootstrap sample, as in the bagging concept. It follows a few steps. First, it calculates the node importance of a tree, where ni_j represents the importance of node j, w_j shows the weighted number of samples reaching node j, C_j shows the impurity value of node j, left(j) is the left child node, and right(j) is the right child node. Equation 15 calculates the importance of each feature on a decision tree, where fi_i represents the importance of feature i. Equation 16 presents the normalization of these importances. Finally, Eq. 17 shows the averaging over all the trees, where RFfi_i shows the importance of feature i calculated from all trees, normfi_ij represents the normalized importance of feature i in tree j, and T is the total number of trees (Random-Forest. xxxx).
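For reference, the quantities defined above can be written out as follows. This is a sketch of the commonly used impurity-based formulation, assumed here to correspond to the node-importance equation and Eqs. 15 to 17; the published equations may differ in notation.

```latex
\begin{align}
ni_j &= w_j C_j - w_{\mathrm{left}(j)} C_{\mathrm{left}(j)} - w_{\mathrm{right}(j)} C_{\mathrm{right}(j)} \\
fi_i &= \frac{\sum_{j \,:\, \text{node } j \text{ splits on feature } i} ni_j}{\sum_{k \,\in\, \text{all nodes}} ni_k} \\
normfi_i &= \frac{fi_i}{\sum_{j \,\in\, \text{all features}} fi_j} \\
RFfi_i &= \frac{\sum_{j \,\in\, \text{all trees}} normfi_{ij}}{T}
\end{align}
```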

Extra-tree
The Extremely Randomized Trees classifier is an ensemble method that aggregates the output of multiple decision trees. It is highly similar to the random forest and differs only in how the DTs in the forest are constructed. Table 15 presents the splitting process of the extremely randomized tree (Geurts et al. 2006).
Every DT of the Extra-Tree forest is built from the attributes of the original sample set. Each node of a tree then uses a random subset of k features of the sample, and each DT selects the best split among them, which yields multiple de-correlated decision trees. Every DT calculates the entropy [Eq. 18] and information gain [Eq. 19], where c represents the number of labels (classes) and p_i is the proportion of rows with label i. The Extra-Tree classifier has simple properties, explicit meanings, and easy conversion to "if-then" rules (Sharaff and Gupta 2019).
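The entropy and information-gain calculations mentioned above can be sketched as follows, assuming the standard definitions with base-2 logarithms (the base used in Eqs. 18 and 19 is not stated in the text). The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    # p = count / total is the proportion of rows carrying each label.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted
```

A split that separates the two classes perfectly, such as `[0, 0, 1, 1]` into `[0, 0]` and `[1, 1]`, recovers the full parent entropy of 1 bit as information gain.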

Meta-estimator (linear SVC)
The bagging ensemble meta-estimator provides an option to select one's own base learner (e.g., a decision tree) for the bagging process, in order to reduce the base estimator's variance. Here, we have chosen Linear-SVC instead of DT as the base classifier for the bagging process. Linear SVC finds the separating hyperplane between two classes. It provides faster execution on large datasets and minimizes the squared hinge loss. First, we built several instances of Linear SVC on random subsets of the original training set. After that, the individual classification results of all instances are aggregated to form a final classification.

Boosting ensemble approach
Boosting is an ensemble learning approach that boosts the performance of weak learners by running them sequentially on multiple subsets of the dataset. Boosting constructs a sequence of models, and each model is trained by considering the errors of the previous model (Freund and Schapire 1996). In most ensemble techniques, learners are trained on statistically identical training sets, whereas in boosting the training set of each learner is altered by the previously trained models (Drucker et al. 1994). Table 16 presents the flow of the boosting ensemble approach (Torelli and Menardi 2008).

Ada-boost
It is the first boosting algorithm, introduced by Freund and Schapire (1996), and is widely used in various applications. It boosts the performance of weak learners by converting them into stronger ones. Table 17 depicts the process of Ada-Boost learning (Bahad and Saxena 2020). Ada-Boost can be trained with any machine learning algorithm but is mostly applied with decision trees, as these are very short and generate only one decision for classification. In this approach, trained models are added sequentially using weighted training data. Ada-Boost supports the concept of adaptive boosting, where weights are assigned to every instance, but higher weights are assigned to misclassified cases. The output is calculated as in [Eq. 20], where f_m represents the m-th weak classifier, each of which carries an assigned weight. The output is the weighted combination of the M weak classifiers.
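The weighted combination of weak classifiers can be sketched as below. The sketch assumes the usual discrete Ada-Boost convention with labels in {-1, +1} (the datasets in this study use 0/1 labels, which would need remapping) and the standard classifier weight 0.5 · ln((1 - err) / err); the symbol names and function names are assumptions, since the weight symbol in Eq. 20 is not recoverable from the text.

```python
import math


def classifier_weight(weighted_error):
    """Weight of a weak classifier with the given weighted error rate (0 < err < 1)."""
    return 0.5 * math.log((1 - weighted_error) / weighted_error)


def adaboost_predict(weak_classifiers, weights, x):
    """Sign of the weighted sum of the weak classifiers' votes (each vote is -1 or +1)."""
    score = sum(w * clf(x) for clf, w in zip(weak_classifiers, weights))
    return 1 if score >= 0 else -1
```

A weak classifier that is no better than chance (error 0.5) receives weight 0, so it does not influence the final vote, while more accurate classifiers receive larger weights.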

Gradient-boosting
It is a powerful approach to building predictive models that generates additive models by sequentially fitting parameterized functions to the current pseudo-residuals at each iteration of the model. The pseudo-residual is the gradient of the loss function, evaluated at the current step. At every iteration, a random subsample (without replacement) is drawn from the training dataset to fit the base learner, which substantially improves the execution speed and approximation accuracy of gradient boosting (Friedman 2002).
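The idea of fitting base learners to pseudo-residuals can be sketched for squared-error loss, where the pseudo-residual is simply the difference between the target and the current prediction. This is a minimal illustration on one numeric feature with regression stumps as hypothetical base learners; the random subsampling step described above is omitted for brevity, and the parameter names are assumptions.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one constant per side."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model: initial mean plus shrunken stumps fitted to pseudo-residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared loss the negative gradient is the plain residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict
```

With a learning rate of 0.5, each round in this toy setting halves the remaining residual, so the prediction approaches the targets geometrically.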

Cat-boost
It is a recent ensemble technique that can incorporate deep learning techniques and work with diverse data types to solve a wide range of problems. The name Cat-Boost combines two words, "Category" and "Boosting," where category means it can work with varieties of data such as text, image, audio, or video, and boosting means that it is a variant of the gradient boosting ensemble. Cat-Boost resolves the exponential expansion of the feature combinations generated by the greedy method at each split. Cat-Boost first divides the dataset into random subsets, then converts the labels into numerals, and finally transforms the categorical features into numbers [Eq. 26].
Here, CountInClass represents the number of ones in the target for the given categorical feature value, totalCount represents the number of previous objects with that value, and prior is a starting parameter (Meng et al. 2016). Table 18 presents the Cat-Boost learning process (Nguyen et al. 2018).
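The transformation of a categorical feature into numbers can be sketched as an ordered target statistic. The sketch assumes the form (CountInClass + prior) / (totalCount + 1), computed only from objects that precede the current one; the rows are processed in the given order rather than in a random permutation, and the function name and default prior are assumptions.

```python
def ordered_target_statistic(categories, targets, prior=0.5):
    """Encode each category value using only the objects seen before it."""
    count_in_class = {}  # ones in the target so far, per category value
    total_count = {}     # objects seen so far, per category value
    encoded = []
    for category, target in zip(categories, targets):
        in_class = count_in_class.get(category, 0)
        total = total_count.get(category, 0)
        encoded.append((in_class + prior) / (total + 1))
        # Update the running counts only after encoding the current object.
        count_in_class[category] = in_class + target
        total_count[category] = total + 1
    return encoded
```

Because each value depends only on earlier objects, the target of the current row never leaks into its own encoding.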

Extreme-gradient boost (XGB)
Tianqi Chen introduced XG-Boost to improve the performance of Gradient-Boosting. It includes a wide range of tools developed under the Distributed Machine Learning Community (DMLC) that can efficiently work with various interfaces. XG-Boost constructs different ensemble trees sequentially for ensemble learning and assigns weights to each value of the database, which decide the probability of it being selected for the next decision tree. The initial weight of each data value is the same, and it is updated according to the further analysis of decision trees. The result obtained by the first DT helps to construct a new classifying model [Eq. 27], and this process is repeated until the construction of the final model.
Here, D is an ensemble model of trees which applies K additive functions [Eq. 28] to predict the output.
Here, F in [Eq. 29] is the defined space of regression trees, and q represents the tree's structure. T represents the number of leaves of a tree, and f_k corresponds to the tree's structure.
[Eq. 30] is minimized to learn the set of functions used in the model. The difference is measured between the target y_i and the prediction ŷ_i.
[Eq. 31] presents the additive training process of the model. f_t improves the model's accuracy by optimizing the objective, and g_i and h_i in [Eq. 32] are the first- and second-order gradient statistics of the loss function.
The constant term can also be removed to obtain the procedure presented in [Eq. 33]. This method is complicated in terms of depth. Hence, boosting trees generate high-variance and low-bias results. In contrast, random trees generate high-bias and low-variance results because the model has a better ability to fit the dataset (Bhati et al. 2020).

Light-GBM (LGBM)
It supports the Gradient-Boosting framework and increases the efficiency of the model with light-weight decision trees. It includes Exclusive Feature Bundling (EFB) and Gradient-based One-Side Sampling (GOSS) techniques to overcome the limitation of the histogram-based approach that is primarily used by all Gradient-Boosting-based algorithms. Light-GBM is a variant of Gradient-Boosting that inherits its predictive power and resolves its scalability problem and long computational time using a leaf-wise growth scheme (Zhang et al. 2019). Light-GBM finds an approximation function that minimizes the value of the loss function [Eq. 34].
It then integrates T regression trees to approximate the final model [Eq. 35].
After that, Light-GBM is trained in an additive manner at step t [Eq. 36]. In Light-GBM, the objective function is approximated with Newton's method. The formulation is transformed into [Eq. 37] after removing the constant term in [Eq. 36],
where g_i and h_i represent the first- and second-order gradient statistics of the loss function. Let I_j represent the sample set of leaf j; then [Eq. 37] is transformed into [Eq. 38].
For a tree structure q(x), w*_j represents the optimal weight score of each leaf node, and the extreme value of the objective could be formulated as [Eq. 39].
Here, this value is the scoring function that measures the quality of the tree structure q [Eq. 40]. Finally, after adding the split, the objective function is as follows, where I_L and I_R represent the sample sets of the left and right nodes, respectively. Light-GBM trees grow vertically, unlike other Gradient-Boosting techniques, making Light-GBM more effective for processing many features and large datasets.
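The optimal leaf weight and the gain of a candidate split can be sketched from the gradient statistics as follows. The sketch assumes the standard second-order formulation shared by XG-Boost and Light-GBM, with an L2 regularization term `lam` on leaf weights; the exact regularization terms in Eqs. 38 to 40 may differ, and the function names are hypothetical.

```python
def leaf_weight(gradients, hessians, lam=1.0):
    """Optimal leaf weight: minus the gradient sum over (hessian sum + lambda)."""
    return -sum(gradients) / (sum(hessians) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Reduction in the objective obtained by splitting a leaf in two.

    Each argument is a list of per-sample first-order (g) or second-order (h)
    gradient statistics for the samples falling in the left or right child.
    """
    def score(gradients, hessians):
        return sum(gradients) ** 2 / (sum(hessians) + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent)
```

A split is worthwhile only when the gain is positive; separating samples whose gradients have opposite signs gives a large gain, whereas splitting samples with identical statistics gives none.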

Comparative results
This section presents the comparative results of eight ensemble techniques (Ada-Boost, Gradient-Boosting, XGB, Light-GBM, Cat-Boost, Random-Forest, Meta-Estimator (Linear SVC), and Extra-Tree) on four popular review datasets (Uber-Reviews, Restaurant-Reviews, Amazon-Reviews, and Food-Reviews). The experiments were conducted on a PC with an Intel(R) Core(TM) i5-8265U processor, 4 GB RAM, and a 64-bit Windows-10 operating system, using Jupyter Notebook. All the datasets are partitioned into two parts, 80% for training and 20% for testing. The standard measures, namely TPR, FPR, accuracy, weighted precision, weighted recall, weighted f1-score, AUC-score, and run-time, were adopted to check the performance of each ensemble model. The definitions of all the employed measures are based on the confusion matrix presented in Table 19.
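As a rough sketch of how the measures listed below follow from the confusion-matrix counts, the code computes accuracy, TPR, FPR, and the support-weighted precision, recall, and f1-score for a binary problem. It assumes that "weighted" means averaging the per-class scores weighted by the number of actual instances of each class, as is common in machine learning libraries; the function names are hypothetical.

```python
def safe_div(numerator, denominator):
    """Division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def class_scores(tp, fp, fn):
    """Precision, recall, and f1-score of one class."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def evaluate(tp, fp, tn, fn):
    """Binary-classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    positive = class_scores(tp, fp, fn)  # class 1
    negative = class_scores(tn, fn, fp)  # class 0: the roles of the counts swap
    support = (tp + fn, tn + fp)         # actual instances of class 1 and class 0
    weighted = [
        safe_div(support[0] * p + support[1] * n, total)
        for p, n in zip(positive, negative)
    ]
    return {
        "accuracy": safe_div(tp + tn, total),
        "tpr": positive[1],
        "fpr": safe_div(fp, fp + tn),
        "weighted_precision": weighted[0],
        "weighted_recall": weighted[1],
        "weighted_f1": weighted[2],
    }
```

Under this definition the weighted recall always equals the accuracy, which is a useful sanity check when reading reported results.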
• Accuracy It is simply the ratio of correct predictions to the total number of predicted observations [Eq. 42].
• Weighted Precision It is the ratio of correctly predicted positive observations to the total predicted positive observations [Eq. 43].
• Weighted Recall It is the ratio of correctly predicted positive observations to all observations in the actual positive class [Eq. 44].
• Weighted F1-Score It is the weighted average of precision and recall [Eq. 45].
• ROC-AUC It stands for the area under the Receiver Operating Characteristic curve, which measures the capability of a classification technique to differentiate between the classes. A higher AUC score indicates better classification, and a lower score indicates less accurate classification. The ROC curve is plotted with the False Positive Rate (FPR) = FP/(FP + TN) on the x-axis and the True Positive Rate (TPR) = TP/(TP + FN) on the y-axis (Bichitrananda Behera and Kumaravelan 2019).
Table 20 reports the TPR, FPR, and run-time values of the eight ensemble models. Accordingly, GB obtains the highest TPR value, 117.6, for Uber reviews. ET receives the highest TPR value, 60.19, for Restaurant reviews. M-SVC gets the highest TPR value, 68.26, for Amazon reviews, and CB obtains the highest TPR value, 88.70, for Food reviews. This shows that GB, ET, M-SVC, and CB are more capable than the other ensembles of identifying the actual positives correctly. M-SVC scores the minimum FPR of 0.0 and 12.75 for Uber reviews and Food reviews. In comparison, GB obtains the minimum FPR of 02.06 and 03.12 for Restaurant and Amazon reviews. In addition, M-SVC provides fast execution for small datasets, as it took the minimum time (97 ms and 67 ms) to run on the Restaurant reviews and Amazon reviews datasets. For the large Food reviews dataset, however, ET took the minimum time of 2350 ms. Overall, the M-SVC approach provides the highest TPR, lower FPR, and fast performance for text classification.
Figure 9a, b, c, and d depicts the combined ROC-AUC scores of the experimented ensemble models for the experimented datasets. It can be seen that Ada-Boost obtains the highest AUC scores of 73 and 72 for the Uber and Restaurant reviews datasets, whereas Cat-Boost and Random-Forest achieve the highest AUC score, 77, for the Amazon reviews dataset. In the case of Food reviews, Meta-Estimator (Linear SVC) achieves a higher AUC score of 87 for text classification.
Ada-Boost obtains a higher AUC score for two (Uber and Restaurant) review datasets. On this basis, we can say that Ada-Boost is the best model to classify text reviews. It has also been discovered that Meta-Estimator (Linear SVC) is more capable of classifying the reviews of a large dataset, as it performs best on the Food reviews dataset, which stores the most reviews.

Fig. 9 The combined ROC-AUC curve of ensemble models: (a) Uber reviews, (b) Restaurant reviews, (c) Amazon reviews, (d) Food reviews
Figure 10 and Table 21 depict the weighted precision, weighted recall, and weighted f1-score of all the experimented models for the four datasets. The bagging-based Meta-Estimator (Linear SVC) obtains a higher weighted precision value (93% and 87%) for the Uber and Food reviews datasets. The Cat-Boost and Random-Forest ensembles achieve a higher weighted precision score of 79% for Amazon reviews. At the same time, XGB obtains a higher weighted precision of 80% for Restaurant reviews. It means that the Meta-Estimator (Linear SVC), Cat-Boost, and Random-Forest ensembles generate low false-positive rates when classifying text. It can be seen that Meta-Estimator (Linear SVC) obtains higher weighted precision, weighted recall, and weighted f1-score of 87% for the large Food reviews dataset, which indicates that it is more capable of correctly identifying actual positives and is less disturbed by false predictions. Extra-Tree gives a higher weighted recall of 72% and a weighted f1-score of 71% for Restaurant reviews, and Random-Forest provides a higher weighted precision of 79%, weighted recall of 78%, and weighted f1-score of 77% for the Amazon reviews dataset. Overall, among the eight experimented ensemble techniques, Meta-Estimator (Linear SVC) and Random-Forest generate low false-positive and low false-negative rates for SA. Furthermore, Meta-Estimator (Linear SVC) is an efficient ensemble model for large and small datasets.
Figure 11 depicts the training accuracy of the experimented ensemble models for the different datasets, and Fig. 12 presents their testing accuracy. According to training accuracy, Extra-Tree and Random-Forest obtain higher and equal scores of 100% for Uber reviews, 93.37% for Restaurant reviews, 93.62% for Amazon reviews, and 100% for Food reviews.
In testing accuracy, Random-Forest achieves a higher score of 91.82% for Uber reviews, Extra-Tree achieves 71.50% for Restaurant reviews, Random-Forest and Extra-Tree achieve a higher and equal score of 77.50% for Amazon reviews, and Meta-Estimator (Linear SVC) obtains a score of 86.94% for Food reviews. In addition, among the boosting techniques, XGB receives higher training accuracy scores of 87.62%, 89.50%, and 95.14% for the Restaurant, Amazon, and Food reviews datasets. The Cat-Boost ensemble obtains the highest testing accuracy scores among the boosting techniques, 91.07%, 71.00%, 76.00%, and 86.40% for the Uber, Restaurant, Amazon, and Food reviews datasets. For Uber reviews, Light-GBM obtains the highest training accuracy of 100%, equal to that of Random-Forest and Extra-Tree. Overall, Cat-Boost achieves better training and testing accuracy than the other boosting techniques but cannot beat the performance of the bagging approach, as Random-Forest and Extra-Tree outperform the boosting ensemble techniques.
After analyzing the results of all the experimented ensemble techniques according to the different measures, we discovered some important facts regarding the high and low performance of bagging and boosting-based ensemble models for SA using multiple datasets. We conclude that the type and size of a dataset distinctly influence the performance of SA.
• Gradient-Boosting generates the minimum difference (1.69%, 4.12%, 6.75%, and 0.41%) between training and testing accuracy scores for both large and small datasets (Uber, Restaurant, Amazon, and Food), which means it overcomes the problem of overfitting and underfitting and reduces the bias and variance when training the model.
• Cat-Boost obtains state-of-the-art results for SA on diverse datasets. It achieves higher testing accuracy and AUC score for all the experimented datasets. Cat-Boost is very easy to implement and generates competitive results with the help of one-hot encoding.
• Light-GBM is known as a robust algorithm capable of handling large datasets, but according to our experiments, it provides lower accuracy and AUC scores for text classification than the other experimented ensemble techniques. Although Light-GBM produces its best results (91.88% training accuracy, 85.58% testing accuracy, and an AUC score of 85) for the large Food reviews dataset, it is still unable to beat the performance of Cat-Boost and XGB.
• Cat-Boost and Gradient-Boosting are two main approaches with distinct frameworks. Apart from these, XGB, Light-GBM, and Cat-Boost follow the framework of Gradient-Boosting. Experiments show that Ada-Boost performs better than Gradient-Boosting in training, testing, and AUC scores for all the datasets but is unable to solve the overfitting and underfitting problem, generating a larger difference between training and testing accuracy than Gradient-Boosting.
• Meta-Estimator with Linear SVC is a bagging-based approach that uses Linear SVC for the bagging procedure instead of decision trees. The experiments show that Meta-Estimator (Linear SVC) obtains better results in terms of TPR, FPR, and run-time than the other experimented ensemble techniques, which means it can generate lower false-positive and false-negative rates and faster execution.

Fig. 10 A comparative weighted precision, weighted recall, and weighted f1-score of ensemble techniques for four datasets: (a) weighted precision, (b) weighted recall, (c) weighted f1-score
As discussed above, our primary motive was to compare the bagging-based ensemble with the boosting-based ensemble for performing SA. After analyzing the results presented in Table 20 and Figs. 10, 11, 12, and 13, we conclude that bagging-based ensemble techniques (Random-Forest, Extra-Tree, and Meta-Estimator (Linear SVC)) performed better than boosting-based techniques. Random-Forest and Extra-Tree perform almost equally. Meta-Estimator (Linear SVC) gives lower training and testing accuracy than Extra-Tree and Random-Forest but provides comparatively higher speed. XGB and Cat-Boost obtain better accuracy and TPR than the other boosting ensembles but cannot beat the performance of bagging-based ensembles. Hence, bagging ensemble-based techniques provide state-of-the-art results for SA. In the introduction part, we have raised some questions regarding the essential aspects and trends of SA.

Research opportunities in SA
SA has gained popularity in various fields, including medicine, politics, industries, and finance. Therefore, researchers are developing various intelligent models for SA. Figure 13 presents the major application areas for SA, where researchers can develop generalized frameworks for real-life applications. Further subsections describe these future opportunities of SA in detail.

SA in medical
SA is gaining popularity in healthcare industries and improving the quality of healthcare services. The opinions and reviews of patients help healthcare providers to diagnose a particular disease (Abualigah et al. 2020). The COVID-19 outbreak increased the demand for SA in healthcare-related services. SA has been applied to extract people's opinions on the nationwide lockdown due to the COVID-19 pandemic (Barkur and Vibha 2020). A novel fusion model has been developed to study the tweets of various coronavirus-affected countries (Basiri et al. 2021). Medical documents reflect the information of patients in terms of diagnosis, examinations, observations, and interventions. Judging the medical conditions of patients in the form of positive and negative responses is required. Several methods have also been developed to handle these kinds of tasks (Denecke and Deng 2015). Therefore, the healthcare sector needs substantial further research in the field of SA.

SA in politics
In the current digital world, politics has moved to a different level, and countries' governments use social platforms to extract people's opinions on established laws and policies. SA has been increasingly implemented to hear the voice of the people. A two-stage model has been developed to predict the results of an election (Ramteke et al. 2016). In the past two years, farmers' protests against three bills passed by the Indian government have shaken the world. Here, demand grew for artificial intelligence-based SA to provide direction in this democratic dispute (Neogi et al. 2021). A Twitter dataset of the 2016 US presidential election was collected and used for SA to find people's choice between Hillary Clinton and Donald Trump (Somula et al. 2016). Hence, efficient SA models are required to address political issues.

Fig. 13 Future opportunities in the field of SA

SA in industries
SA strongly supports business growth. Industries use various applications of SA, such as brand monitoring, feedback gathering, the voice of customers (VoC), product analysis, market research, and competitive research. These SA-based applications help industries with decision-making. A novel LSTM-CNN-based model has been developed using a grid search optimization method to determine customers' opinions of a restaurant (Priyadarshini and Cotton 2021). An automatic brand monitoring framework was proposed using Romanian Twitter data. This model effectively generated a reputation report for a single brand or a comparative report for two different companies over a desired time frame (Istrati et al. 2021).

SA in finance
SA is used to evaluate financial news and helps investors choose beneficial schemes to invest in. The use of SA in finance has grown rapidly with the increasing popularity of cryptocurrency. Several cryptocurrencies, such as Bitcoin, Ethereum, Binance Coin, Quant, Solana, and ZCash, are available on digital finance platforms. No legal framework backs these cryptocurrencies that would give users the confidence to invest in them. SA can reveal the opinions of different people about a particular cryptocurrency and thereby support decision-making. Machine learning techniques have been used to predict the price movement of the Bitcoin, Ethereum, Ripple, and Litecoin cryptocurrencies. SA has wide future scope in cryptocurrency price movement prediction, and researchers are taking a keen interest in this field.

Technical discussion
SA is widely adopted for many kinds of tasks, from extracting customer opinions on specific issues to monitoring patients' mental health based on their posts on social platforms. Additionally, the emergence of new technologies such as Cloud Computing, Big Data (Birjali et al. 2021), Data Science, and Blockchain has widened the field of NLP, including SA. SA has provided many benefits in the business intelligence domain, where companies have exploited it for customer feedback, product improvement, and marketing strategies (Bernabé-Moreno et al. 2020). SA has become a handy tool in cryptocurrency price prediction, Forex prediction, and stock market prediction. A recommender system is a model that is trained to suggest relevant items (music, movies, or products) to buy. Here, the sentiment analyzer plays a major role in the recommender system (Birjali et al. 2021): SA gathers users' opinions and feeds this information into the recommender system for the final recommendation. Researchers proposed a novel adaptive learning model based on social platform analysis and showed how SA and Big Data could transform e-learning platforms. Furthermore, for government policies and similar issues, SA is very helpful in monitoring possible public reactions. In the past few years, Twitter has been used to analyze people's opinions on the global COVID-19 pandemic. SA has been adopted to observe government strategies (Alaoui et al. 2018), people's reactions, and World Health Organization (WHO) policies as preventive measures against COVID-19. The healthcare domain has recently taken considerable interest in SA, which allows medical actors to extract information about drug reactions, disease diagnosis, epidemics, and patient moods (Ramírez-Tinoco et al. 2019; Tiwari et al. 2021).
Machine learning is the most promising approach for SA. Usually, machine learning-based SA provides a higher accuracy score than the lexicon-based approach. It offers various feature engineering techniques that extract the critical features from the dataset and improve the efficiency of SA. A graph-based approach connects interrelated words in text reviews to calculate people's sentiments and opinions, where the vertices (nodes) correspond to features available in the reviews. Various graph-based methods and algorithms have been applied in recent decades to solve the problem of SA (Tiwari and Kumar 2020; Bhati and Rai 2021). Ensemble learning improves classification by reducing the risk of selecting a poor model. It combines the capability and knowledge of various learners, which increases classification accuracy and decreases prediction errors. The hybrid approach combines the capabilities of various approaches, such as rule-based, lexicon-based, machine learning-based, and deep learning-based ones, and it enhances the efficiency of the SA model. It lets a researcher tailor the best combination of approaches to a particular task.
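The idea of combining the knowledge of several learners can be illustrated with a minimal bagging sketch. The example below is an assumption-laden toy and not the implementation used in this study: the training reviews, the word-count base learner, and the names `train_word_counter`, `bagging_fit`, and `bagging_predict` are all hypothetical. Each base learner is trained on a bootstrap sample of the data, and the ensemble returns the majority vote.

```python
import random
from collections import Counter

# Toy labelled reviews: hypothetical data, not one of the survey's datasets.
TRAIN = [
    ("great food and friendly staff", "pos"),
    ("loved the fast delivery", "pos"),
    ("excellent ride very clean car", "pos"),
    ("terrible service and cold food", "neg"),
    ("driver was rude and late", "neg"),
    ("awful experience never again", "neg"),
]


def train_word_counter(samples):
    """Base learner: count how often each word appears under each label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in samples:
        counts[label].update(text.split())
    return counts


def predict_one(model, text):
    """Sum the per-label word counts for a review; ties go to 'pos'."""
    words = text.split()
    pos = sum(model["pos"][w] for w in words)
    neg = sum(model["neg"][w] for w in words)
    return "pos" if pos >= neg else "neg"


def bagging_fit(samples, n_learners=15, seed=0):
    """Train each base learner on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_learners):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_word_counter(bootstrap))
    return models


def bagging_predict(models, text):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(predict_one(m, text) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    ensemble = bagging_fit(TRAIN)
    print(bagging_predict(ensemble, "terrible rude awful"))
```

A single base learner that happens to draw no negative reviews would label every review positive; the majority vote over many bootstrap samples is what dampens such individual errors.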
Elsevier and Springer are the two most common venues for SA publications, and Elsevier, Springer, and ACM are the three publishers most often chosen by researchers for SA-related research. A benchmark dataset plays a vital role in sound research. It has been seen in the graph that researchers most frequently use product reviews for their experiments, with Twitter data the second most popular choice. Researchers also consider movie reviews, medical reviews, and hotel reviews for SA-related experiments, and a few generate their own datasets for sentiment classification. SA is an emerging field, but it faces various challenges that complicate processing and decrease the efficiency of related models. Although researchers are working to solve these issues using different techniques, accuracy is still lacking. These challenges make it harder to extract the correct meaning of sentiments and to classify the correct polarity. A large number of features (N) increases the dimensionality of the datasets. Feature engineering is a very important step in SA applications and opinion mining. In optimal feature engineering, feature selection and feature extraction should interact with the final processing. Table 22 summarizes the response of the studies addressing each research question.
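The growth in dimensionality mentioned above can be made concrete with a small sketch. It assumes, purely for illustration, that the features are word n-grams; the reviews and the helper names `ngrams`, `vocabulary`, and `prune_rare` are hypothetical. The last helper is a simple document-frequency filter, shown as one possible filter-style feature selection step rather than the method used in the surveyed studies.

```python
from collections import Counter

# Hypothetical toy reviews, for illustration only.
REVIEWS = [
    "the food was not good",
    "the ride was good",
    "not a good ride",
]


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vocabulary(reviews, max_n):
    """Collect every distinct n-gram of length 1..max_n across the reviews."""
    vocab = set()
    for review in reviews:
        tokens = review.lower().split()
        for n in range(1, max_n + 1):
            vocab.update(ngrams(tokens, n))
    return vocab


def prune_rare(reviews, max_n, min_df=2):
    """Keep only the n-grams that occur in at least min_df reviews."""
    doc_freq = Counter()
    for review in reviews:
        # Each review contributes at most 1 to an n-gram's document frequency.
        doc_freq.update(vocabulary([review], max_n))
    return {gram for gram, count in doc_freq.items() if count >= min_df}


if __name__ == "__main__":
    for max_n in (1, 2, 3):
        print(max_n, len(vocabulary(REVIEWS, max_n)))
    print("kept after pruning:", len(prune_rare(REVIEWS, 2)))
```

On these three short reviews, the feature space grows from 7 unigrams to 17 features with bigrams and 24 with trigrams, while the document-frequency filter keeps only the 5 features shared by at least two reviews.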

Research question: Research findings: References

References (continued): Joshi et al. 2015; Farooq et al. 2017; Gautam et al. 2018; Pan et al. 2010; Peng et al. 2018; Chunning et al. 2020; Hong and Hatzivassiloglou 2003; Minqing and Liu 2004; Ge et al. 1998; Lappin and Leass 1994

Research question: RQ-3
Research findings: A large number of features (N) increases the dimensionality of the datasets. Feature engineering is a very important step in SA applications and opinion mining. In optimal feature engineering, feature selection and feature extraction should interact with the final processing. Dimensionality reduction lowers the high dimensionality of the dataset while keeping the more discriminative and constructive features of the collection. Feature engineering is categorized into two major parts: (1) feature extraction and (2) feature selection. Feature extraction derives the required or essential features from the original set. Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA) are two popular feature extraction techniques. Feature selection, in turn, is a process that reduces the number of variables for predictive models. Effective and efficient feature selection improves the performance of SA. The feature selection process includes missing values removal, low variance removal, highly correlated feature removal, univariate selection, and recursive elimination. Feature selection methods are categorized into two groups: filter methods and wrapper methods that are utilized for effective

Research question: RQ-4
Research findings: Emotion extraction and classification are essential parts of SA. This survey therefore summarizes the major emotions that researchers consider in SA across facial expression, gesture, motion, and voice recognition, and identifies a standard emotion set that is common to various studies. Automatic human facial expression extraction is an emerging application of Human-Computer Interaction (HCI) and affective computing. Emotion extraction and classification have therefore become prime aspects of SA research. Several researchers have been working on distinctive sets of emotions and expressions.
References: Gunesa and Piccardi 2005; Gunesb and Piccardi 2008; Hablani et al. 2013; Chen et al. 2013; Hayat and Bennamoun 2014

Research question: RQ-5
Research findings: The ensemble approach of machine learning has been used in various applications and has produced outstanding results. Ensemble learning is also applicable to the SA task. Therefore, we have presented a comparative analysis of diverse ensemble methods, divided into two main categories: bagging and boosting. This study compares eight popular ensemble learners (Random-Forest, Extra-Tree, Meta-Estimator (Linear SVC), Ada-Boost, Gradient-Boosting, XGB, Cat-Boost, and Light-GBM) to choose the best model for SA. The experiments have been conducted on reviews from four different domains: Uber reviews, Restaurant reviews, Amazon reviews, and Food reviews. The comparative results presented in Sect. 5 indicate that bagging-based ensemble techniques (Random-Forest, Extra-Tree, and Meta-Estimator (Linear SVC)) performed better than boosting-based techniques. Random-Forest and Extra-Tree perform almost equally. Meta-Estimator (Linear SVC) yields lower training and testing accuracy than Extra-Tree and Random-Forest but runs comparatively faster. XGB and Cat-Boost obtain better accuracy and TPR than the other boosting ensembles but cannot beat the performance of bagging-based ensembles. Hence, bagging ensemble-based techniques provide state-of-

Conclusions and future work
This article presents an extensive literature survey of 92 reputable articles covering lexicon-based, graph-based, machine learning-based, ensemble-based, and hybrid techniques for SA. It is observed that ensemble-based and hybrid techniques have gained more popularity for text classification. In addition, essential aspects such as frequently used SA datasets, publishing platforms, proposed techniques, SA challenges, SA feature-engineering techniques, and various emotion theories are also discussed in this study. With the rapid demand for SA, several challenges have also arisen in processing text reviews. We therefore discussed several SA-related challenges, namely stance detection, sarcasm detection, negation handling, domain dependence, huge lexicons, word sense disambiguation, and anaphora resolution, along with their proposed solutions. Feature engineering is a prime factor in effective text classification, and we presented an extensive taxonomy of feature-engineering techniques used for text processing. The emotion theories of five prominent researchers have also been discussed. The essence of their ideas is that disgust, fear, anger, happiness, and sadness are the emotions mainly included in basic emotion classification from text.
Our primary objective is to help companies select a better sentiment model for brand monitoring and product reviews. This article also implemented numerous ensemble-based techniques on review datasets from different domains, providing a systematic comparative analysis of bagging-based and boosting-based ensembles for SA. We have also illustrated the core of ensemble-based techniques for SA. Five boosting-based and three bagging-based ensemble techniques have been implemented on four text review datasets to conduct extensive experiments. The previously discussed ensemble-based research, together with the experimental results, provides a broad perspective on applying ensemble-based techniques to SA. Finally, the experimental results demonstrate that bagging-based ensemble techniques outperform boosting-based techniques in terms of TPR, FPR, accuracy, weighted precision, weighted recall, weighted F1-score, AUC score, and run-time for SA. XGB and Cat-Boost, from the boosting approach, produced effective results but were unable to beat the performance of bagging-based ensembles. This survey and its analytical study will help in determining the best technique for building SA-related applications. In future work, we will explore hybrid approaches, in which different techniques and models are combined to develop a better model for SA with reduced computational cost. The goal is to develop a hybrid model for SA applications that combines different approaches, and we will assess the effectiveness and reliability of such hybrid methods under different types of parameters.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Declarations
Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.